An Attempt to Find an A Priori Measure of Step Size
Ellen F. Rosen and Lawrence M. Stolurow 

PROBLEM 

Step size is an important determiner of student performance. Although it
may seem otherwise, step size is not readily measurable. Logically, the
most reasonable measure of step size is empirical difficulty as calculated
from student performance, but this is an a posteriori measure. An a priori
measure is needed. The present investigation is an attempt to find a
fine-grained predictor of empirical difficulty.

METHOD 



Subjects and Judges

The judges who served as raters were ten programers from the staff
of UICSM. The subjects (students) have been described elsewhere
(Beberman and Stolurow, 1963, Quarterly Report 9 & 10, Chapter VII).

MATERIALS 

Students' materials. The materials consisted of the two versions of
Part 112¹ of the UICSM-PIP materials (see Beberman and Stolurow, 1963).

¹Large step size version prepared by Clark Himmel.
Two booklets were prepared for the students' use and were assigned
randomly to those available for the study. One version was the small
step version, designated 112S; the other was the large step version,
designated 112L.

Both versions were given to students as learning materials under three
conditions of use, which differed in how the program was combined with the
teacher's presentation. In one condition, the program was given to the
students, after which the teacher covered the material. This was called the
"lead" mode. In a second condition, the program was given to the students
after the teacher had covered the material. This was called the "follow"
mode. In the third condition, called the "pure" mode, only the program was
given to the students; the teacher did not cover the material.

Judges' materials. Two booklets, Judges 1 and Judges 2, were prepared
for the judges. Each booklet consisted of a segment from both student
versions, so that each judge rated half of each student version.

Procedure for judges. Judges were given one form of the judges'
booklets and asked to rate it according to four categories. A copy of the
instructions to judges is presented in Appendix A. The instructions are
self-explanatory. They define and illustrate the judges' task, which was to
compare pairs of adjacent steps and to rate changes in complexity on a scale
from -5 through +5 on four separate characteristics: (a) the concept;
(b) the vehicle; (c) the numeral; and (d) the response.
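The judge's task amounts to filling in one row per adjacent page pair, with a rating from -5 through +5 under each of the four characteristics. As an illustration only, the sketch below stores one such row and rejects out-of-range values. The class and field names are hypothetical, and the sketch assumes ratings are whole numbers.

```python
from __future__ import annotations

from dataclasses import dataclass

# The four characteristics rated for every adjacent pair of pages.
CHARACTERISTICS = ("concept", "vehicle", "numeral", "response")


@dataclass(frozen=True)
class StepRating:
    """One judge's ratings of the change from page `first` to page `first + 1`.

    Hypothetical record layout. Assumes whole-number ratings on the
    -5..+5 scale: 0 means no change, positive values mean the second
    page is more complex, negative values mean it is less complex.
    """

    judge: str
    first: int  # first page of the adjacent pair
    concept: int
    vehicle: int
    numeral: int
    response: int

    def __post_init__(self) -> None:
        # Reject anything that is not an integer between -5 and +5.
        for name in CHARACTERISTICS:
            value = getattr(self, name)
            if not isinstance(value, int) or not -5 <= value <= 5:
                raise ValueError(f"{name} rating {value!r} is outside -5..+5")

    @property
    def last(self) -> int:
        """Second page of the pair."""
        return self.first + 1

    def values(self) -> tuple[int, ...]:
        """Ratings in the fixed order concept, vehicle, numeral, response."""
        return tuple(getattr(self, name) for name in CHARACTERISTICS)


# A judge who sees multiplication added to addition might record +2 for concept.
row = StepRating("judge 1", first=1, concept=2, vehicle=0, numeral=0, response=1)
```

Keeping validation in the record itself means a mistyped rating is caught as soon as a sheet is transcribed, rather than surfacing later as an odd standard score.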
RESULTS

The judges' ratings were converted into standard scores for each category
(Guilford, 1956, pp. 489-494). The standard scores for each step were then
averaged across judges within each category, and across both categories and
judges. Thus, two sets of ratings were obtained, one for each student booklet
version.
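The report cites Guilford (1956) for the conversion but does not spell out the formula. One plausible reading, assumed here rather than taken from the source, is that each judge's ratings in a category are converted to z-scores using that judge's own mean and standard deviation over the steps he rated, which removes differences in how individual judges used the scale. The sketch also assumes that every judge in the input rated the same steps; the function names are illustrative.

```python
from __future__ import annotations

from statistics import fmean, pstdev

CATEGORIES = ("concept", "vehicle", "numeral", "response")


def to_standard_scores(values: list[float]) -> list[float]:
    """Convert one judge's ratings in one category to z-scores.

    Uses the population standard deviation. Assumption: if a judge gave the
    same rating to every step, all of that judge's scores are set to 0.
    """
    mean = fmean(values)
    sd = pstdev(values)
    if sd == 0:
        return [0.0] * len(values)
    return [(v - mean) / sd for v in values]


def average_step_scores(ratings: dict[str, dict[str, list[float]]]):
    """Average standard scores for each step.

    ratings[judge][category] lists one judge's raw ratings for steps 1..n.
    Returns (per_category, total), where per_category[category][i] is the
    mean over judges for step i + 1 and total[i] is the mean over all
    judges and all four categories.
    """
    z = {
        judge: {cat: to_standard_scores(by_cat[cat]) for cat in CATEGORIES}
        for judge, by_cat in ratings.items()
    }
    judges = list(z)
    n_steps = len(z[judges[0]][CATEGORIES[0]])
    per_category = {
        cat: [fmean(z[j][cat][i] for j in judges) for i in range(n_steps)]
        for cat in CATEGORIES
    }
    total = [
        fmean(z[j][cat][i] for j in judges for cat in CATEGORIES)
        for i in range(n_steps)
    ]
    return per_category, total
```

Because every step has a score in every category, a step's total equals the mean of its four category averages. The same relation holds for the summary values in Table 1: for Part 112L, (.172 + .008 + .000 + .045) / 4 is approximately .056, the tabled Total.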

From the students' responses, an empirical difficulty index was calculated
for each page (the percentage of students answering all the problems on the
page correctly). The descriptive statistics for the judges' ratings and for
the students' empirical difficulty under the three conditions of teacher
presentation are presented in Table 1 and Table 2, respectively.
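The difficulty index can be computed directly from a page's answer record. In the minimal sketch below the names are illustrative, and the use of the sample (n - 1) standard deviation for the Table 2 summaries is an assumption, since the report does not say which divisor was used. Note that the index is a percent correct, so a higher value indicates an easier page.

```python
from __future__ import annotations

from statistics import fmean, stdev


def page_difficulty(responses: list[list[bool]]) -> float:
    """Percent of students who answered every problem on a page correctly.

    responses[s] holds student s's right/wrong marks for the page's problems.
    Higher values mean an easier page.
    """
    if not responses:
        raise ValueError("no student responses for this page")
    all_correct = sum(1 for marks in responses if all(marks))
    return 100.0 * all_correct / len(responses)


def summarize(difficulties: list[float]) -> tuple[float, float]:
    """Mean and sample standard deviation (n - 1 divisor, assumed) of page difficulties."""
    return fmean(difficulties), stdev(difficulties)


# Four students on a page with two problems: two of them get both right.
page = [[True, True], [True, False], [True, True], [False, False]]
print(page_difficulty(page))  # 50.0
```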

Correlations of Judgments with Empirical Difficulty

Tables 3, 4, and 5 present the correlations of step size judgments
with empirical difficulty. The judgments and empirical difficulty were
paired by treating the difficulty of the last page of the step as the
measure to be predicted. Thus, for example, each judge's rating of the
step from page 1 to page 2 of Part 112 was paired with the empirical difficulty
calculated from students' responses to the questions on page 2 of Part 112.
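The pairing rule and the correlation can be sketched as follows. The function names are hypothetical; the code assumes step scores are keyed by the first page of each step and difficulties by page number, and it computes an ordinary Pearson product-moment correlation, which is assumed here because the report does not name the coefficient explicitly.

```python
from __future__ import annotations

from math import sqrt
from statistics import fmean


def pair_with_last_page(step_scores: dict[int, float],
                        difficulty: dict[int, float]) -> tuple[list[float], list[float]]:
    """Pair each step (page k -> page k + 1) with the difficulty of page k + 1.

    step_scores[k]: averaged judged change for the step that starts on page k.
    difficulty[p]: percent of students with every problem on page p correct.
    """
    firsts = sorted(step_scores)
    judged = [step_scores[k] for k in firsts]
    observed = [difficulty[k + 1] for k in firsts]
    return judged, observed


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation of two equal-length lists."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two equal-length lists of at least 3 values")
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Toy data: larger judged increases in complexity go with lower percent correct.
judged, observed = pair_with_last_page({1: 0.5, 2: -0.2, 3: 1.0},
                                       {1: 90.0, 2: 70.0, 3: 85.0, 4: 60.0})
r = pearson_r(judged, observed)
```

With 51 rated steps the test of r has 51 - 2 = 49 degrees of freedom, and with 32 steps it has 30, which matches the df given in the table footnotes. Because difficulty is a percent correct, a negative r means that steps judged larger are followed by pages fewer students complete without error.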

It might be noted here that Part 112 has more than one problem per frame.
Consequently, these data are likely to be more reliable than those
obtained from more conventional linear programs with only one response
per page.
Table 1

Descriptive Statistics on Judges' Ratings
of Two Versions of Part 112 of the UICSM
Programed Learning Materials

Version        Categoryᵃ    Mean Amount   Rank   Standard
                            of Change            Error

Part 112Sᵇ     Concept        -.085         5      .846
(small step)   Vehicle         .011         1      .593
               Numeral         .010         2      .654
               Response       -.004         3      .668
               Total          -.017         4      .401

Part 112Lᶜ     Concept         .172         1      .503
(large step)   Vehicle         .008         4      .854
               Numeral         .000         5      .797
               Response        .045         3      .696
               Total           .056         2      .523

ᵃThese categories are described in Appendix A.

ᵇBased on the average rating of five judges on 51 steps using a standard
score conversion of scale values.

ᶜBased on the average rating of five judges on 32 steps using a standard
score conversion of scale values.



Table 2

Distribution Statistics for Empirical Difficulty
(Students' Responses) Under Three Conditions of
Use for the Two Versions

Version        Condition of Use          Mean          Standard
                                         Difficulty    Deviation

112S           Program Leadᵃ             78.425        18.703
(small step)   Program Followᵇ           75.490        19.007
               "Pure" (Only Program)ᶜ    75.686        17.740

112L           Program Leadᵈ             78.361        17.983
(large step)   Program Followᵉ           76.875        22.141
               "Pure" (Only Program)ᶠ    74.023        18.229

ᵃBased on a sample of 11 students on 51 pages.
ᵇBased on a sample of 8 students on 51 pages.
ᶜBased on a sample of 20 students on 51 pages.
ᵈBased on a sample of 13 students on 32 pages.
ᵉBased on a sample of 10 students on 32 pages.
ᶠBased on a sample of 16 students on 32 pages.
Table 3

Correlations of Judged and Observed Step Size
for the Condition of Use Called Pure Mode (Program Only)

Version        Concept   Vehicle   Numeral   Response   Total

Part 112S       -.080     -.071     -.010    -.278**    -.178
(51 frames)

Part 112L       -.270     -.293     -.329    -.360*     -.429*
(32 frames)

*For H₀: ρ = 0, r.95 = .349 for 30 df (two-sided).
**For H₀: ρ = 0, r.95 = .274 for 49 df (two-sided).



Table 4

Correlations of Judged and Observed Step Size
for the Condition of Use Called Lead Mode (Program First)

Version        Concept   Vehicle   Numeral   Response   Total

Part 112S       -.096     -.127     -.213    -.336**    -.312**
(51 frames)

Part 112L       -.248     -.188     -.419*   -.271      -.386*
(32 frames)

*For H₀: ρ = 0, r.95 = .349 for 30 df (two-sided).
**For H₀: ρ = 0, r.95 = .274 for 49 df (two-sided).



Table 5

Correlations of Judged and Observed Step Size
for the Condition of Use Called Follow Mode (Program Follow)

Version        Concept   Vehicle   Numeral   Response   Total

Part 112S       -.031      .065      .052    -.293**    [illegible]
(51 frames)

Part 112L       -.089     -.108     -.434*   -.175      -.289
(32 frames)

*For H₀: ρ = 0, r.95 = .349 for 30 df (two-sided).
**For H₀: ρ = 0, r.95 = .275 for 49 df (two-sided).







Correlations significantly different from zero at the .05 level were obtained in
(1) the pure mode (Table 3), between (a) the response category and the
empirical difficulty for both the large and small step size programs, and
(b) the overall average (total) rating and difficulty for the large step
sequence; (2) the lead mode (Table 4), between (a) the numeral category and
difficulty for the large step sequence, (b) the response category and difficulty
for the small step sequence, and (c) the average overall rating across
categories and difficulty for both sequences; and (3) the follow mode (Table 5),
between the numeral category and difficulty for the large step sequence, and
between the response category ratings and difficulty for the small step sequence.
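The critical values in the table footnotes can be recomputed from the sampling distribution of r when the population correlation is zero (assuming bivariate normal data), whose density is proportional to (1 - r²)^((df - 2)/2). The sketch below, using only the Python standard library, integrates that density numerically with Simpson's rule and finds the two-sided .05 cutoff by bisection. It is an illustration, not necessarily the procedure the authors used; the function names are hypothetical.

```python
import math


def null_r_density(r: float, df: int) -> float:
    """Density of the sample correlation r when rho = 0, with df = n - 2."""
    # Normalizing constant is the beta function B(1/2, df/2).
    beta = math.exp(math.lgamma(0.5) + math.lgamma(df / 2) - math.lgamma((df + 1) / 2))
    return (1.0 - r * r) ** ((df - 2) / 2) / beta


def upper_tail(r: float, df: int, intervals: int = 2000) -> float:
    """P(R >= r) under rho = 0, by composite Simpson's rule on [r, 1]."""
    h = (1.0 - r) / intervals
    total = null_r_density(r, df) + null_r_density(1.0, df)
    for i in range(1, intervals):
        total += (4 if i % 2 else 2) * null_r_density(r + i * h, df)
    return total * h / 3


def critical_r(df: int, alpha: float = 0.05) -> float:
    """Smallest |r| significant at level alpha (two-sided), found by bisection."""
    if df < 2:
        raise ValueError("df must be at least 2")
    lo, hi = 0.0, 1.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if 2 * upper_tail(mid, df) > alpha:
            lo = mid  # still too likely under rho = 0, so the cutoff lies higher
        else:
            hi = mid
    return (lo + hi) / 2


for df in (30, 49):
    print(df, round(critical_r(df), 3))  # prints 30 0.349, then 49 0.276
```

The first value reproduces the printed .349; the second is close to, though slightly above, the printed .274.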

CONCLUSIONS 

The results of this study are not entirely clear. A glance at Table 2
shows that the average empirical difficulty of the steps did not, in fact, differ
between the two versions within any presentation mode. This is probably
because the two versions were prepared before the beginning of the study.

The large step version was generated by deleting frames that were felt
to be unnecessary. Thus, it is quite probable that the two versions did not
really differ in step size.
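One rough way to see how small the between-version differences in Table 2 are is to compare the two versions within each mode using only the tabled means, standard deviations, and page counts. The sketch below computes a Welch-type t ratio that treats each page as one observation. That treatment is an assumption made here for illustration (pages answered by the same students are not independent), and it is not an analysis reported in this study.

```python
from __future__ import annotations

from math import sqrt

# (mean difficulty, standard deviation, number of pages), copied from Table 2.
TABLE_2 = {
    "lead": {"112S": (78.425, 18.703, 51), "112L": (78.361, 17.983, 32)},
    "follow": {"112S": (75.490, 19.007, 51), "112L": (76.875, 22.141, 32)},
    "pure": {"112S": (75.686, 17.740, 51), "112L": (74.023, 18.229, 32)},
}


def welch_t(a: tuple[float, float, int], b: tuple[float, float, int]) -> float:
    """Difference in means divided by its Welch standard error."""
    (m1, s1, n1), (m2, s2, n2) = a, b
    return (m1 - m2) / sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)


for mode, versions in TABLE_2.items():
    diff = versions["112S"][0] - versions["112L"][0]
    t = welch_t(versions["112S"], versions["112L"])
    print(f"{mode:<6} difference {diff:+.3f}  t = {t:+.2f}")
# lead   difference +0.064  t = +0.02
# follow difference -1.385  t = -0.29
# pure   difference +1.663  t = +0.41
```

Every ratio is well under 1 in absolute value, which is consistent with the observation that the two versions did not differ in average difficulty.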

This has potentially important implications for previous studies of
step size (Coulson and Silberman, 1960; Evans, Glaser and Homme, 1960;
Glaser and Reynolds, 1962; Maccoby and Sheffield, 1958; Margolius and
Sheffield, 1961; Smith and Moore, 1961), in which the typical method of
manipulation has been the simple deletion or addition of frames to create the
so-called larger step version. The present results suggest that the deletion
procedure may produce an illusion of change rather than an actual change in step
size. Certainly this simple manipulation is suspect unless the step size changes
are documented by additional information about the program changes
produced by frame deletion.

The most important point of these results is that step size and the number of
frames deleted are most likely not in one-to-one correspondence; when aiming to
increase step size, one must consider quality (the kind of material deleted)
as well as quantity (the number or amount of material deleted). This issue of
quantity and quality will be discussed in a report on sequential analysis of
parts within the sequence and frames within the parts.

The data in Tables 3, 4, and 5 suggest that variations in difficulty probably
could be achieved by systematically varying the response and numeral
characteristics of the steps. These two dimensions seem to be the most
promising basis for changing step size.

Contrary to the finding of Rothkopf (1963), this study has shown that judges
can reliably estimate empirical difficulty by examining the stimulus materials.
In part, this reliability was obtained with the present rating scale by basing
judgments on changes between adjacent frames. The indices that seem most
promising for this purpose are response and numeral, the former being somewhat
more dependable (significant correlations in three out of four possibilities)
than the latter (significant correlations in one out of four possibilities).
SUMMARY

This study is an attempt to develop a methodology for estimating empirical
difficulty under conditions in which the relative range of step sizes is small.
Judgments of the changes taking place from frame to frame were obtained with a
standardized scale running from -5 through +5, which required the judges to
evaluate four characteristics of the stimulus materials: concept, vehicle,
numeral, and response. Judgments were obtained for a "small step" version
and for the same material with some steps deleted ("large step"). The stimulus
materials were booklets consisting of 54 and 35 frames, respectively, taken
as a random sample from the original version of the experimental
edition of the UICSM High School mathematics programed materials.






APPENDIX A

Instructions for Judges 
We are interested in the similarities and differences in pairs of adjacent
pages or "learning steps" contained in the accompanying booklet of programed
instruction, and we would like your help in finding out how much these
adjacent pages differ from and resemble each other with regard to the
complexity (abstractness) of certain given characteristics of the material
presented on the pages. (The pages to be judged will be considered in serial
order, i.e., pages 1 and 2 will be compared, then pages 2 and 3, then pages
3 and 4, etc., through the final two pages in the booklet.)

We want you to rate the changes in complexity (abstractness) of certain
characteristics in going from the first page of the pair to the second page on
a scale from -5 through +5. A rating of zero (0) represents no change in the
complexity of a characteristic; ratings above zero represent progressively
increasing complexity from the first to the second page, and ratings below zero
represent progressively decreasing complexity from the first to the second
page, so that a rating of ±5 represents the most extreme change in complexity
of a characteristic in either direction. If a characteristic is not present on
either of the pages of the pair, record a zero (0) as your rating.




Prepared and developed by Clark Himmel to conform to the dimensional
requirements developed in work with a program on fractions by
L. M. Stolurow with the assistance of Gaila Grubb.
The four characteristics that we want you to consider are (A) the Concept,
(B) the Vehicle, (C) the Numeral, and (D) the Response. A description of
each of these characteristics, along with an example and a rating guide, is
given below.

Concept: refers to the mathematical rule, principle, idea, or closely
related group of rules, concepts, conventions, ideas, or
principles in mathematics, such as the associative principle
of addition, the axiomatic system in Euclidean geometry, or
the idea of negative numbers.

You should be looking for changes in complexity, in level
of description, or in manner of presentation. You are to
identify and rate these changes when leaving one concept and
turning to another as they happen within two adjacent pages.
Also note changes in overall complexity when two or more
concepts (or, if you prefer, "subconcepts") are presented
simultaneously on one or both of the pair of pages being
considered. For example, if only addition is presented on one
page and both addition and multiplication are presented on the
following page, the change is probably an increase in the
complexity of this characteristic. If this occurred, the rating
assigned to the pair of pages for the concept might be +2.
Vehicle: refers to that which is used to help communicate or convey the
concept (and the associated material) being presented by giving
a concrete or exemplary background or "real setting" to the
problems and expository material, such as two airplanes traveling
toward each other in a rate-of-travel problem in algebra, or the
ledger entries for a retail business in a bookkeeping problem.

This characteristic is one which may not be present on
all program steps. Consider the vehicle "a road with mile
markers" for presenting the idea of real numbers (both positive
and negative), where a trip from R to B (represented 3) is a +3
[...]

[...] of a pair, the rating assigned would be zero (0). If it is absent
only on the second page of the pair, the rating assigned would be
+5. (The above assumes that no new vehicle characteristics
were introduced on either of the pages in the pair.) If something
[...] or a new vehicle is introduced in going from the first page to
the second, a rating commensurate with the accompanying
change in complexity should be assigned. If the same material
were deleted from the second page, a rating commensurate with
this change should be assigned.

Numeral: refers simply to the symbols for, or representations of, numbers
presented by Roman numerals, Hindu-Arabic numerals, or others,
plus their accompanying "operators" and "designators," such as
+, ÷, ×, =, or -7, so that an entire expression like
(+16 ÷ -4) × +2 = -8 would be considered under this
characteristic.

Consideration should be given to changes in complexity
in the types of numerals given on the pages. This should be
relatively straightforward, since numerals and their "operators"
and "designators" are presented in an explicit notation system.
For example, a first page might present addition of simple
three-digit numerals while the next page calls for multiplication
of the square roots of similar three-digit numerals. The pair
would then probably receive a fairly high positive rating, perhaps a +3.



Response: refers to the particular answer(s) to be chosen, constructed or
written, or in some way indicated by the student as he finishes
the problem(s) or question(s) on a page.

Response complexity will vary with the characteristics
of the actual response given and with the abstractness or
difficulty of the specific question(s) or explicitly stated problem(s)
to be answered or solved. For example, a relatively complex
response in the UICSM Unit I material would be one which is
constructed or written by the student, for example, "the
associative principle of addition." A relatively less complex
response would be choosing one of two alternatives. The second
facet of "response" to be considered is the nature of the problem(s)
or question(s) to be answered. It also can be scaled in terms of
complexity or abstractness. A question like "2 + 2 = ?" is
probably less complex than a long and tedious word problem
which also requires only a single-digit answer.



Each of the characteristics on the pair of steps (pages) to be compared should
be rated with regard to the change in complexity (or abstractness in the sense of
being abstruse, more difficult to comprehend, ideationally complex, or intricate)
in going from one step to the next one.
On your rating sheets you will find the four characteristics listed as
headings of four columns. Each pair of pages to be compared and rated
is listed at the left. When comparing pairs of pages, do not include the answers
and "feedback" material (usually included between the statements "check your
answers" and "record your results") in your considerations for rating. We
are interested in having you rate the "instructional" and "question" portions
of the pages.



Remember: 

1. Rate changes on the scale from +5 through -5:

   +5   Increased complexity
    0   Mid-point (no change)
   -5   Decreased complexity

2. Consider the four following characteristics when rating each pair of pages: 

A. Concept 

B. Vehicle 

C. Numeral 

D. Response 

3. For each characteristic consider the amount of change in your perception. 
APPENDIX B
Sample Rating Sheet

Name ______________________          Date ______________

PAGES     Concept     Vehicle     Numeral     Response
1-2
2-3
3-4
4-5
5-6
6-7
7-8
8-9
9-10
10-11
11-12
12-13
13-14
14-15
15-16
16-17
17-18
18-19
19-20
20-21
21-22
22-23
23-24
24-25
25-26
26-27
27-28
28-29
29-30
30-31
31-32
32-33
33-34
34-35
35-36
36-37
37-38
38-39
39-40
40-41
41-42
42-43
43-44
44-45
45-46